Evaluating Web Content Quality via Multi-scale Features

Authors

  • Guanggang Geng
  • Xiao-Bo Jin
  • Xinchang Zhang
  • Dexian Zhang
Abstract

Web content quality measurement is crucial to many web content processing applications. This paper explores multi-scale features that may affect the quality of a host and develops automatic statistical methods to evaluate Web content quality. The extracted properties include statistical content features, page-level and host-level link features, and TF-IDF features. Experiments on the ECML/PKDD 2010 Discovery Challenge data set show that the algorithm is effective and feasible for quality tasks in multiple languages, and that the multi-scale features differ in their discriminative ability and complement one another well on most tasks.
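The abstract names three feature families but does not define them, so the following is only a minimal sketch of how such multi-scale features might be concatenated into one vector per host. The function names (`content_stats`, `tfidf_vectors`, `host_features`), the particular content statistics, and the two-column link features are all hypothetical choices for illustration, not the authors' method.

```python
import math
import re
from collections import Counter


def content_stats(text):
    """Simple statistical content features (illustrative choices only):
    token count, mean token length, ratio of distinct tokens."""
    tokens = re.findall(r"\w+", text.lower())
    n = len(tokens)
    avg_len = sum(len(t) for t in tokens) / n if n else 0.0
    distinct_ratio = len(set(tokens)) / n if n else 0.0
    return [float(n), avg_len, distinct_ratio]


def tfidf_vectors(docs):
    """Plain TF-IDF: term frequency times log(N / document frequency)."""
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))
    n_docs = len(docs)
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty text
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def host_features(docs, link_feats):
    """Concatenate content, link, and TF-IDF features for each host.
    link_feats is assumed to be precomputed elsewhere (e.g. degree counts)."""
    _, tfidf = tfidf_vectors(docs)
    return [
        content_stats(d) + list(l) + v
        for d, l, v in zip(docs, link_feats, tfidf)
    ]


# Toy usage with made-up host text and made-up link counts.
hosts = ["cheap pills cheap pills buy now", "university research on web quality"]
links = [[3, 120], [45, 12]]
X = host_features(hosts, links)
```

Each row of `X` could then be passed to any standard classifier or regressor; which learner the paper uses is not stated in the abstract.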

Similar resources

Analyzing new features of infected web content in detection of malicious web pages

Recent improvements in web standards and technologies enable attackers to hide and obfuscate infectious code with new methods and thus escape security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...

Development of design criteria and evaluation scale for web-based learning platforms

Standardized and objective design criteria for evaluating web-based learning platforms can effectively distinguish the quality of a platform and, therefore, contribute to improving web-based learning outcomes. This is a two-phase study, in which the Delphi technique and heuristic evaluation were employed in the first phase to develop the evaluation criteria and scale of web-based learning platforms...

On the Question of How Web 2.0 Features Support Critical Map Reading

Web 2.0 technologies enable users to produce and distribute their own content. The variety of motives for taking part in these communication processes leads to considerable differences in levels of quality. While social media contexts have developed features for evaluating contributions, user-generated maps frequently do not offer tools to question or examine the origin and elements of user-gen...

Specifying Quality Requirements for the Web 2.0 Applications

To specify quality requirements for Web 2.0 applications we propose an integrated approach which considers features for contents, functionalities and services. In this work we discuss how to model internal quality, external quality and quality in use views taking into account not only the software characteristics – as those specified in the ISO 9126-1 quality models – but also the own features to ...

A Framework for Summarization of Multi-topic Web Sites

Web site summarization, which identifies the essential content covered in a given Web site, plays an important role in Web information management. However, straightforward summarization of an entire Web site with diverse content may lead to a summary heavily biased to the dominant topics covered in the target Web site. In this paper, we propose a two-stage framework for effective summarization ...

Journal:
  • CoRR

Volume abs/1304.6181

Publication date 2013